Summary: PyTorch uses modules to represent neural networks with learnable parameters for optimization. Modules can be interconnected to create complex neural networks, and hooks can be added for custom computations during training. PyTorch’s autograd system handles the backward pass for gradient computation, simplifying the training process.
Modules make it simple to specify learnable parameters for PyTorch’s Optimizers to update. (View Highlight)
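Note: a minimal sketch of this (not from the article) — any nn.Parameter attached to a module shows up in parameters(), which is exactly what an optimizer consumes:

```python
import torch
from torch import nn

# nn.Linear registers its weight and bias as learnable nn.Parameters.
model = nn.Linear(3, 1)

# parameters() yields exactly those tensors, so the optimizer knows what to update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
print([name for name, _ in model.named_parameters()])  # ['weight', 'bias']
```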
Modules are straightforward to save and restore, transfer between CPU / GPU / TPU devices, prune, quantize, and more. (View Highlight)
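Note: a small illustration of the save/restore and device-transfer points (file name and layer sizes are made up; CUDA may or may not be available):

```python
import torch
from torch import nn

model = nn.Linear(3, 1)

# Save and restore: state_dict() captures parameters and persistent buffers.
torch.save(model.state_dict(), "model.pt")
restored = nn.Linear(3, 1)
restored.load_state_dict(torch.load("model.pt"))

# Device transfer: .to() moves all parameters and buffers in place.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```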
the module itself is callable, and calling it invokes its forward() function (View Highlight)
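Note: a sketch of the callable behavior; MyLinear here is my stand-in for the custom linear module the article builds:

```python
import torch
from torch import nn

class MyLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_features, out_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, input):
        return (input @ self.weight) + self.bias

m = MyLinear(4, 3)
out = m(torch.randn(2, 4))  # __call__ dispatches to forward() (and runs any registered hooks)
```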
The “backward pass” computes gradients of a loss with respect to the module’s parameters, which can be used for “training” the parameters through gradient descent methods. (View Highlight)
PyTorch’s autograd system automatically takes care of this backward pass computation, so it is not required to manually implement a backward() function for each module. (View Highlight)
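Note: a minimal training-step sketch showing that autograd drives the backward pass, with no hand-written backward() anywhere:

```python
import torch
from torch import nn

model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(8, 3)
targets = torch.randn(8, 1)

loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()        # autograd computes gradients for every learnable parameter
optimizer.step()       # gradient descent update using those gradients
optimizer.zero_grad()  # reset gradients before the next iteration
```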
Sequential automatically feeds the output of the first MyLinear module as input into the ReLU, and the output of that as input into the second MyLinear module. (View Highlight)
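Note: a sketch of that chaining, with nn.Linear standing in for the article's MyLinear:

```python
import torch
from torch import nn

net = nn.Sequential(
    nn.Linear(4, 3),
    nn.ReLU(),
    nn.Linear(3, 1),
)
out = net(torch.randn(2, 4))  # each module's output feeds the next, in order
```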
it is recommended to define a custom module for anything beyond the simplest use cases, as this gives full flexibility on how submodules are used for a module’s computation. (View Highlight)
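Note: a sketch of what that flexibility buys — forward() is ordinary Python, so a custom module can route its submodules however it likes (the Net below is illustrative):

```python
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.l0 = nn.Linear(4, 3)
        self.l1 = nn.Linear(3, 1)

    def forward(self, x):
        x = torch.relu(self.l0(x))
        # Arbitrary control flow is fine here, unlike a fixed Sequential chain.
        if x.shape[0] > 1:
            x = x * 2
        return self.l1(x)

out = Net()(torch.randn(2, 4))
```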
Immediate children of a module can be iterated through via a call to children() or named_children() (View Highlight)
Note: It’s not clear from the output if the modules are discovered when they are defined inside of __init__, or if they are discovered by parsing the forward method. I suspect the former, in which case the order in which they are declared matters for the order of their output.
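Note: it is the former — submodules are registered the moment they are assigned as attributes in __init__ (nn.Module.__setattr__ intercepts the assignment); forward() is never parsed. A quick check (module names are illustrative):

```python
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Deliberately declared in the reverse of the order used in forward().
        self.second = nn.Linear(3, 1)
        self.first = nn.Linear(4, 3)

    def forward(self, x):
        return self.second(torch.relu(self.first(x)))

# Iteration order follows declaration order in __init__, not usage in forward():
print([name for name, _ in Net().named_children()])  # ['second', 'first']
```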
New highlights added June 28, 2024 at 1:08 PM
ModuleList and ModuleDict modules are useful here; they register submodules from a list or dict: (View Highlight)
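Note: a sketch of registering submodules from a list and a dict (layer sizes and names are made up):

```python
import torch
from torch import nn

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain Python lists/dicts would NOT register these as submodules;
        # ModuleList/ModuleDict do, so their parameters are visible to the parent.
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])
        self.heads = nn.ModuleDict({
            "regression": nn.Linear(4, 1),
            "classification": nn.Linear(4, 10),
        })

    def forward(self, x, task="regression"):
        for layer in self.layers:
            x = torch.relu(layer(x))
        return self.heads[task](x)

out = DynamicNet()(torch.randn(2, 4), task="classification")
```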
calls to parameters() and named_parameters() will recursively include child parameters. (View Highlight)
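Note: a sketch of that recursion through a nested module (names are illustrative):

```python
import torch
from torch import nn

class Parent(nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1))  # the parent's own parameter
        self.child = nn.Linear(2, 2)              # the child's weight and bias

for name, p in Parent().named_parameters():
    print(name, p.shape)
# scale torch.Size([1])
# child.weight torch.Size([2, 2])
# child.bias torch.Size([2])
```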
Note: This causes the value of mean to be saved in serialized instances of the object. It will also be retained when using the .to method. If persistent=False is passed, the state will still move devices, but it will not be serialized.
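Note: a sketch of both buffer flavors ("mean" mirrors the article's running-statistics example; the module itself is illustrative):

```python
import torch
from torch import nn

class RunningMean(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        # Persistent buffer: moves with .to() AND is saved in the state_dict.
        self.register_buffer("mean", torch.zeros(num_features))
        # Non-persistent buffer: moves with .to() but is NOT serialized.
        self.register_buffer("scratch", torch.zeros(num_features), persistent=False)

m = RunningMean(4)
print(list(m.state_dict().keys()))  # ['mean'] -- 'scratch' is excluded
```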
Note: Hooks are callbacks that can be registered to occur during the forward or backward pass. There are separate hooks for before and after each pass.
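Note: a sketch registering one hook of each kind (hook bodies are placeholders; the backward pre-hook requires a reasonably recent PyTorch):

```python
import torch
from torch import nn

model = nn.Linear(3, 1)

def forward_pre_hook(module, args):
    print("before forward:", args[0].shape)

def forward_hook(module, args, output):
    print("after forward:", output.shape)

def backward_pre_hook(module, grad_output):
    print("before backward:", grad_output[0].shape)

def backward_hook(module, grad_input, grad_output):
    print("after backward:", grad_output[0].shape)

model.register_forward_pre_hook(forward_pre_hook)
model.register_forward_hook(forward_hook)
model.register_full_backward_pre_hook(backward_pre_hook)
model.register_full_backward_hook(backward_hook)

model(torch.randn(2, 3)).sum().backward()
```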